Homicide Report USA, 1980-2014

My Tran, Alex Park, Esmay Muniz

2022-11-30

Introduction

Link here for the murder data set

This dataset contains homicide report records. In our analysis, we look for patterns in these records: we visualize the data and identify correlations among the murders committed from 1980 to 2014.

Our Questions

Domain question: What factors are related to the motives and behaviors of the killers?

Other questions

What weapon was used the most?

Which state had the most homicides?

Did the victims know their perpetrator?

How does gender play a role in homicide incidents?

Get ready

library(tibble) # used to create tibbles
library(tidyr) # used to tidy up data
library(rmarkdown) # dynamic documents
library(ggplot2) # used for data visualization
library(dplyr) # used for data manipulation
library(shiny) # used for interactive web apps and dynamic visuals
library(prettydoc) # used for creating pretty documents from R Markdown
library(knitr) # for dynamic report generation
library(tidyverse) # attaches several tidy data packages at once
library(hms) # classes for storing time-of-day values
library(kableExtra) # used to construct complex tables
library(tigris) # downloads US Census shapefiles for state and county maps
# additional libraries for other graphs and maps
library(plotly) # interactive graphs
library(rjson) # reading JSON data
library(leaflet) # interactive maps
library(leaflet.providers) # basemap tile providers for leaflet
library(maps) # map outlines
library(viridis) # colorblind-friendly color scales for ggplot2
library(viridisLite) # viridis color palettes
library(sp) # classes for spatial data
library(quantmod) # quantitative financial modelling tools
library(plot3D) # 3D plots
library(sf) # simple features for spatial data
library(RColorBrewer) # ColorBrewer color palettes
library(gganimate) # animated ggplot2 graphics

Peering into the Unfiltered Murder Dataset

Our original dataset is “Homicide Report” from Kaggle. First, we take a look at the data.

unzip(zipfile="./homicide.zip")
data <- read.csv("database.csv")
glimpse(data) 
## Rows: 638,454
## Columns: 24
## $ Record.ID             <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1…
## $ Agency.Code           <chr> "AK00101", "AK00101", "AK00101", "AK00101", "AK0…
## $ Agency.Name           <chr> "Anchorage", "Anchorage", "Anchorage", "Anchorag…
## $ Agency.Type           <chr> "Municipal Police", "Municipal Police", "Municip…
## $ City                  <chr> "Anchorage", "Anchorage", "Anchorage", "Anchorag…
## $ State                 <chr> "Alaska", "Alaska", "Alaska", "Alaska", "Alaska"…
## $ Year                  <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, …
## $ Month                 <chr> "January", "March", "March", "April", "April", "…
## $ Incident              <int> 1, 1, 2, 1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, …
## $ Crime.Type            <chr> "Murder or Manslaughter", "Murder or Manslaughte…
## $ Crime.Solved          <chr> "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "…
## $ Victim.Sex            <chr> "Male", "Male", "Female", "Male", "Female", "Mal…
## $ Victim.Age            <int> 14, 43, 30, 43, 30, 30, 42, 99, 32, 38, 36, 20, …
## $ Victim.Race           <chr> "Native American/Alaska Native", "White", "Nativ…
## $ Victim.Ethnicity      <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unk…
## $ Perpetrator.Sex       <chr> "Male", "Male", "Unknown", "Male", "Unknown", "M…
## $ Perpetrator.Age       <int> 15, 42, 0, 42, 0, 36, 27, 35, 0, 40, 0, 49, 39, …
## $ Perpetrator.Race      <chr> "Native American/Alaska Native", "White", "Unkno…
## $ Perpetrator.Ethnicity <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Unk…
## $ Relationship          <chr> "Acquaintance", "Acquaintance", "Unknown", "Acqu…
## $ Weapon                <chr> "Blunt Object", "Strangulation", "Unknown", "Str…
## $ Victim.Count          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Perpetrator.Count     <int> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, …
## $ Record.Source         <chr> "FBI", "FBI", "FBI", "FBI", "FBI", "FBI", "FBI",…
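The glimpse already hints that missing information is stored as the string "Unknown" in character columns, and that Perpetrator.Age is 0 in rows where the perpetrator's sex is also Unknown. Below is a minimal base R sketch of how one might count those placeholders per column. The five-row `sample_data` frame is hypothetical, and treating an age of 0 as "unknown" is our assumption from the printed rows, not a documented code.

```r
# Hypothetical five-row sample mimicking a few columns of database.csv
sample_data <- data.frame(
  Victim.Sex      = c("Male", "Male", "Female", "Unknown", "Female"),
  Perpetrator.Sex = c("Male", "Unknown", "Male", "Unknown", "Male"),
  Perpetrator.Age = c(15L, 0L, 42L, 0L, 36L),
  Weapon          = c("Knife", "Unknown", "Handgun", "Handgun", "Rifle"),
  stringsAsFactors = FALSE
)

# Count how many cells in each character column equal "Unknown"
is_chr <- vapply(sample_data, is.character, logical(1))
unknown_counts <- colSums(sample_data[is_chr] == "Unknown")
print(unknown_counts)

# Assumption: an age of 0 marks an unknown perpetrator
zero_ages <- sum(sample_data$Perpetrator.Age == 0)
print(zero_ages)
```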

Data Key Terms:

Agency Type: Law enforcement agency that handled the case

State/City: State and city (or county) where the homicide was reported

Year/Month: Year and month of the homicide

Crime Type: Classification assigned to the case (murder, manslaughter, or negligence)

Crime Solved: Whether the case has been solved or not

Victim Sex/Age/Race: Victim profile

Perpetrator Sex/Age/Race: Perpetrator profile

Relationship: The perpetrator's relation to the victim

Weapon: Weapon used to commit the homicide

Case Open/Closed: Another way of describing whether a crime has been solved (closed) or not (open)

Solve Rate: Percentage of Homicide Reports where the case was closed
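The solve rate is the number of solved cases divided by all cases, times 100. As a small sketch, assuming Crime.Solved holds the strings "Yes" and "No" as the glimpse shows, it can be computed per state in base R. The `cases` data frame below is made up for illustration.

```r
# Hypothetical sample; Crime.Solved uses "Yes"/"No" as in the glimpse above
cases <- data.frame(
  State        = c("Alaska", "Alaska", "Alaska", "Texas", "Texas"),
  Crime.Solved = c("Yes", "Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Solve rate = solved cases / all cases * 100, computed for each state
# (the mean of a logical vector is the share of TRUE values)
solve_rate <- tapply(cases$Crime.Solved == "Yes", cases$State, mean) * 100
print(round(solve_rate, 1))
```

On this toy sample, Alaska's solve rate is about 66.7% and Texas's is 50%.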

Top murder cases by state: we use the raw dataset to count the murder cases reported in each state.

data %>% group_by(State) %>% 
  summarize(Murder_Count = n()) %>% 
  arrange(desc(Murder_Count)) %>% 
  kbl() %>% kable_paper()  %>% scroll_box(height = "800px")
State Murder_Count
California 99783
Texas 62095
New York 49268
Florida 37164
Michigan 28448
Illinois 25871
Pennsylvania 24236
Georgia 21088
North Carolina 20390
Louisiana 19629
Ohio 19158
Maryland 17312
Virginia 15520
Tennessee 14930
Missouri 14832
New Jersey 14132
Arizona 12871
South Carolina 11698
Indiana 11463
Alabama 11376
Oklahoma 8809
Washington 7815
District of Columbia 7115
Arkansas 6947
Colorado 6593
Kentucky 6554
Mississippi 6546
Wisconsin 6191
Massachusetts 6036
Nevada 5553
Connecticut 4896
New Mexico 4272
Oregon 4217
Minnesota 3975
Kansas 3085
West Virginia 3061
Utah 2033
Iowa 1749
Alaska 1617
Hawaii 1338
Nebraska 1331
Rhodes Island 1211
Delaware 1179
Idaho 1150
Maine 869
New Hampshire 655
Wyoming 630
Montana 601
South Dakota 442
Vermont 412
North Dakota 308

The table shows that California has the most murder cases (99,783), with Texas second (62,095) and New York third (49,268).

Top 10 states with the highest number of cases:

## # A tibble: 10 × 2
##    State          Murder_Count
##    <chr>                 <int>
##  1 California            99783
##  2 Texas                 62095
##  3 New York              49268
##  4 Florida               37164
##  5 Michigan              28448
##  6 Illinois              25871
##  7 Pennsylvania          24236
##  8 Georgia               21088
##  9 North Carolina        20390
## 10 Louisiana             19629

What is the most used weapon? We count how many times each kind of weapon was used to find the killers' most common choice.

data %>% group_by(Weapon) %>% 
  summarize(Most_Weapon_Used = n()) %>% 
  arrange(desc(Most_Weapon_Used)) %>% 
  kbl() %>% kable_paper() %>% scroll_box(height = "800px")
Weapon Most_Weapon_Used
Handgun 317484
Knife 94962
Blunt Object 67337
Firearm 46980
Unknown 33192
Shotgun 30722
Rifle 23347
Strangulation 8110
Fire 6173
Suffocation 3968
Gun 2206
Drugs 1588
Drowning 1204
Explosives 537
Poison 454
Fall 190

Handguns are by far the most frequently used weapon: 317,484 of the 638,454 records (about half) list a handgun.
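The raw data records guns under several separate labels (Handgun, Firearm, Shotgun, Rifle, Gun), so the Handgun row alone understates how often guns in general appear. The sketch below re-enters the counts from the table above and adds up the gun-related labels; which labels count as "guns" is our own grouping, not a category in the dataset.

```r
# Weapon counts copied from the table above (raw data, all records)
weapon_counts <- c(
  Handgun = 317484, Knife = 94962, `Blunt Object` = 67337, Firearm = 46980,
  Unknown = 33192, Shotgun = 30722, Rifle = 23347, Strangulation = 8110,
  Fire = 6173, Suffocation = 3968, Gun = 2206, Drugs = 1588,
  Drowning = 1204, Explosives = 537, Poison = 454, Fall = 190
)

total <- sum(weapon_counts)

# Share of records where the weapon is recorded as "Handgun"
handgun_share <- weapon_counts[["Handgun"]] / total

# Our own grouping (an assumption): treat every gun label as a gun
gun_types <- c("Handgun", "Firearm", "Shotgun", "Rifle", "Gun")
all_guns <- sum(weapon_counts[gun_types])

cat(sprintf("Handgun share: %.1f%%\n", 100 * handgun_share))
cat(sprintf("All gun labels: %d records (%.1f%%)\n", all_guns, 100 * all_guns / total))
```

With these counts, handguns account for about 49.7% of all records, and the five gun labels together cover 420,739 records, about 65.9%.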

Murder Cases in USA

Here we visualize murder counts across the US. Heatmaps are useful for highlighting the locations with the highest counts. In this map, California stands out in red compared to other states, while states in the northern US show lower murder counts. These are raw counts rather than per-capita rates, so populous states naturally rank higher.

Murder Cases in California and Texas

Now let's focus on Texas, which unfortunately comes in second for the number of murder cases. We wanted to see which county had the most cases: Harris County had the highest count.

We also included California, since it has the most murders of any state, to see where within the state most of the murders occurred.
The California map is broken down by city rather than by county.

Gender of victims by state: we count the number of cases for each victim gender in each state.
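A gender-by-state count is a two-way contingency table. In base R this can be sketched with `table()`; the `cases` data frame here is hypothetical and only stands in for the State and Victim.Sex columns.

```r
# Hypothetical sample standing in for the State and Victim.Sex columns
cases <- data.frame(
  State      = c("Texas", "Texas", "Texas", "Ohio", "Ohio"),
  Victim.Sex = c("Male", "Female", "Male", "Male", "Unknown"),
  stringsAsFactors = FALSE
)

# Rows are states, columns are victim genders, cells are case counts
gender_by_state <- table(State = cases$State, Victim.Sex = cases$Victim.Sex)
print(gender_by_state)
```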

Is there any missing or unknown data in this dataset?

# As an example, let's see how unknown values show up in the victim gender field
data %>% group_by(Victim.Sex) %>% summarize(Gender = n())
## # A tibble: 3 × 2
##   Victim.Sex Gender
##   <chr>       <int>
## 1 Female     143345
## 2 Male       494125
## 3 Unknown       984

As the table shows, 984 victims have an unknown gender, so we need to tidy up our data and remove the unknowns.

# How about unknown weapons?
data %>% group_by(Weapon) %>% 
  summarize(Most_Weapon_Used = n()) %>% 
  arrange(desc(Most_Weapon_Used)) %>% 
  kable() %>% kable_paper() %>% scroll_box(height = "800px")
Weapon Most_Weapon_Used
Handgun 317484
Knife 94962
Blunt Object 67337
Firearm 46980
Unknown 33192
Shotgun 30722
Rifle 23347
Strangulation 8110
Fire 6173
Suffocation 3968
Gun 2206
Drugs 1588
Drowning 1204
Explosives 537
Poison 454
Fall 190

Here, 33,192 records list the weapon as Unknown, so the weapon field also contains missing information that we need to tidy up.

Let's look at the distribution of cases by victim age in the raw data.

# Graph for cases by age
data %>% ggplot(aes(Victim.Age)) + geom_histogram(binwidth = 50) + 
  labs(title = "How many cases over victims' ages?", 
       x = "Age of Victim (years old)", y = "Cases")

The histogram shows many cases with victims nearly 1,000 years old. That is impossible, so these values most likely stand in for an unknown age. We therefore filtered our data to make it cleaner and more coherent.
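One quick way to confirm the impossible ages is to count values above a plausible maximum. In this sketch the ages are made up, and 998 is used as an example of such a placeholder value; that exact value is an assumption, since the histogram only shows values near 1,000.

```r
# Hypothetical victim ages; 998 plays the role of a placeholder (assumption)
victim_age <- c(14L, 43L, 30L, 998L, 0L, 99L, 998L, 25L)

# Flag ages that cannot be real
implausible <- victim_age > 100
cat("Records with age over 100:", sum(implausible), "\n")

# Which implausible values occur, and how often
print(table(victim_age[implausible]))
```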

Filtered Data

## Rows: 346,656
## Columns: 5
## $ Year         <int> 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 1980, 198…
## $ Victim.Age   <int> 14, 43, 43, 30, 42, 99, 20, 36, 31, 16, 33, 27, 33, 31, 2…
## $ Victim.Sex   <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Male…
## $ Relationship <chr> "Acquaintance", "Acquaintance", "Acquaintance", "Acquaint…
## $ Weapon       <chr> "Blunt Object", "Strangulation", "Strangulation", "Rifle"…

For our filtered data, we kept only the variables useful for our questions and removed all records containing ‘Unknown’ values. The filtered data has five columns: Year, the victim's age, the victim's sex, the victim's relationship to the perpetrator, and the type of weapon used in each incident. We also restricted the victim's age to the range 1 to 100 so that the impossible ages are excluded.
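The code that produced filtered_data is not shown in the report. The following is a hedged base R sketch of the steps as described: keep five columns, drop any row containing "Unknown", and keep ages from 1 to 100. The `raw` data frame is hypothetical, and the authors may well have used dplyr's `filter()` instead.

```r
# Hypothetical sample of the raw data (column names match database.csv)
raw <- data.frame(
  Year         = c(1980L, 1980L, 1981L, 1981L, 1982L, 1982L),
  Victim.Age   = c(14L, 998L, 30L, 0L, 42L, 27L),
  Victim.Sex   = c("Male", "Female", "Unknown", "Male", "Female", "Male"),
  Relationship = c("Acquaintance", "Wife", "Stranger", "Friend", "Unknown", "Stranger"),
  Weapon       = c("Blunt Object", "Knife", "Handgun", "Rifle", "Fire", "Handgun"),
  stringsAsFactors = FALSE
)

# Keep only the five columns used in the analysis
keep_cols <- c("Year", "Victim.Age", "Victim.Sex", "Relationship", "Weapon")
filtered <- raw[, keep_cols]

# Flag rows that contain "Unknown" in at least one column
has_unknown <- apply(filtered == "Unknown", 1, any)

# Keep plausible ages only (1 to 100 inclusive)
age_ok <- filtered$Victim.Age >= 1 & filtered$Victim.Age <= 100

filtered_data <- filtered[!has_unknown & age_ok, ]
print(filtered_data)
```

Only the first and last rows survive: the others have an impossible age or an "Unknown" value.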

1. Year:

Using the filtered data, we looked at the number of murders in each year from 1980 to 2014. The counts peak around 1980 and 1993 and then start to decrease. The 1980 peak coincided with a severe global economic recession, when inflation in the US peaked at 14.76%.

# The rate of murders during the period of 1980-2014
filtered_data %>% 
  group_by(Year) %>% 
  summarise(murder = n()) %>% 
  ggplot(aes(Year,murder)) + geom_point() + geom_smooth()
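The yearly count behind the scatter plot is a frequency table of Year. This sketch uses a hypothetical `years` vector to show how the count per year and the year with the most murders can be pulled out in base R.

```r
# Hypothetical Year values standing in for filtered_data$Year
years <- c(1980, 1980, 1980, 1981, 1981, 1992, 1993, 1993, 1993, 1993, 1994)

# Number of murders recorded in each year
murders_per_year <- table(years)
print(murders_per_year)

# Year with the largest count in this sample
peak_year <- as.integer(names(which.max(murders_per_year)))
cat("Peak year:", peak_year, "\n")
```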

2. Relationship between them:

Next, we wanted to see whether the perpetrator knew the victim before the killing, so we created a new variable, Relationship_with_murder, based on the Relationship column.

## Rows: 346,656
## Columns: 6
## $ Year                     <chr> "1980", "1980", "1980", "1980", "1980", "1980…
## $ Victim.Age               <chr> "14", "43", "43", "30", "42", "99", "20", "36…
## $ Victim.Sex               <chr> "Male", "Male", "Male", "Male", "Female", "Fe…
## $ Relationship             <chr> "Acquaintance", "Acquaintance", "Acquaintance…
## $ Weapon                   <chr> "Blunt Object", "Strangulation", "Strangulati…
## $ Relationship_with_murder <chr> "Known", "Known", "Known", "Known", "Known", …
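The rule behind Relationship_with_murder is not shown. Since the filtered data no longer contains "Unknown" relationships but 93,288 cases are still labeled Unknown, one plausible reading is that "Stranger" was mapped to Unknown and every other relationship to Known. The sketch below implements that assumed rule with `ifelse()` on hypothetical values.

```r
# Hypothetical Relationship values from the filtered data
relationship <- c("Acquaintance", "Wife", "Stranger", "Friend", "Stranger", "Son")

# Assumed rule: "Stranger" means the victim did not know the perpetrator
relationship_with_murder <- ifelse(relationship == "Stranger", "Unknown", "Known")
print(table(relationship_with_murder))
```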

Counting the cases shows that a victim is much more likely to know the perpetrator: 253,368 of the 346,656 filtered cases (about 73%) are labeled Known.

# Count how many cases involve a victim and perpetrator who knew each other
relationship_data %>% group_by(Relationship_with_murder) %>% summarise(cases = n())
## # A tibble: 2 × 2
##   Relationship_with_murder  cases
##   <chr>                     <int>
## 1 Known                    253368
## 2 Unknown                   93288
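Turning the two counts above into percentages makes the comparison concrete. The counts are copied from the output; `prop.table()` divides each one by their total.

```r
# Counts copied from the table above
known_counts <- c(Known = 253368, Unknown = 93288)

# Each count divided by the total number of filtered cases
shares <- prop.table(known_counts)
print(round(100 * shares, 1))
```

This gives about 73.1% Known and 26.9% Unknown.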

3. Gender:

We counted the victims by sex in the filtered data and graphed the result. Male victims far outnumber female victims, so a male is more likely to be murdered than a female.

##   Victim.Sex      n
## 1     Female  89362
## 2       Male 257294
# Graph for Victim Sex
filtered_data %>% ggplot(aes(Victim.Sex, fill = Victim.Sex)) + 
  geom_bar(color = 'black') + theme_bw() + 
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 5) + 
  labs(title = "Which gender is the most targeted?", 
       x = "Victim Gender", y = "Cases", fill = "Victim Gender")
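The same percentage arithmetic applies to victim sex, using the two counts printed above.

```r
# Counts copied from the victim sex output above
victim_sex <- c(Female = 89362, Male = 257294)

# Share of male victims and number of male victims per female victim
male_share <- victim_sex[["Male"]] / sum(victim_sex)
male_to_female <- victim_sex[["Male"]] / victim_sex[["Female"]]

cat(sprintf("Male share: %.1f%%, about %.1f male victims per female victim\n",
            100 * male_share, male_to_female))
```

Males make up about 74.2% of victims in the filtered data, roughly 2.9 male victims for every female victim.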

Next, we looked at how the victim's gender relates to whether the victim knew the perpetrator.

# Cases by victim gender, split by whether the victim knew the perpetrator
relationship_data %>% ggplot(aes(Victim.Sex, fill = Relationship_with_murder)) + 
  geom_bar(color = 'black') + 
  theme_bw() + 
  # stack the count labels so each sits in the middle of its own bar segment
  geom_text(aes(label = after_stat(count)), stat = "count", 
            position = position_stack(vjust = 0.5)) + 
  labs(title = "How many victims knew their killer, by gender?", 
       x = "Victim Gender", y = "Cases", fill = "Relationship with murder")

The graph shows that most victims knew their killer before the incident.
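Because there are far more male than female victims, raw counts in a stacked bar chart can hide differences in proportion between genders. Row-wise proportions compare the share of victims who knew their killer within each gender. A base R sketch on hypothetical vectors:

```r
# Hypothetical victim genders and relationship labels
sex   <- c("Female", "Female", "Female", "Male", "Male", "Male", "Male")
known <- c("Known", "Known", "Unknown", "Known", "Unknown", "Unknown", "Known")

tab <- table(Victim.Sex = sex, Relationship_with_murder = known)

# margin = 1 divides each row by its row total, so each gender sums to 1
row_shares <- prop.table(tab, margin = 1)
print(round(row_shares, 2))
```

Here two of three female victims (about 0.67) knew the perpetrator, compared with half of the male victims. On the real data, this is the calculation that would support a claim that female victims are more likely to know their killer.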

4. Age:

What does the distribution of cases by victim age look like?

filtered_data %>% ggplot(aes(Victim.Age)) + 
  geom_histogram(color = 'Black', fill = 'white', binwidth = 3) +
  labs(x = "Victim Age", y = "Cases") 

The histogram shows that victims are most often between 21 and 25 years old; this is the most common age range, not the average age. However, the histogram alone cannot tell us whether age is a factor that affects the rate of murder cases. So, let’s take a look at this flow.
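The most common age range (the mode of the binned ages) is not the same as the average age. A small base R sketch with made-up ages groups them into five-year bins with `cut()` and compares the modal bin with the mean.

```r
# Hypothetical victim ages
victim_age <- c(18, 21, 22, 22, 24, 25, 27, 33, 40, 22, 23, 60)

# Five-year bins, closed on the right: (0,5], (5,10], ..., (95,100]
age_bins <- cut(victim_age, breaks = seq(0, 100, by = 5))
bin_counts <- table(age_bins)

# The bin with the most cases versus the arithmetic mean
modal_bin <- names(which.max(bin_counts))
cat("Most common age bin:", modal_bin, "\n")
cat("Mean age:", round(mean(victim_age), 1), "\n")
```

In this sample the modal bin is (20,25], i.e. ages 21 to 25, while the mean age is about 28.1.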

Summary

Based on our findings, one important takeaway from our analysis is that perpetrators usually know their victims before committing murder. Female victims are more likely to know their killer. Also, men are more likely to be murdered than women. The dataset does not record motives directly; killers may have many different motives, such as a desperate need for money, sex, or power. Lastly, the most used weapon was a handgun. This makes us wonder whether stricter gun laws in each state would reduce homicides, since handguns are easy to acquire.